class: center, middle, inverse, title-slide

.title[
# Data Analysis for Policy Research Using R
]
.subtitle[
## Introduction and R Basics
]
.author[
### Harold Stolper
]
.institute[
### Columbia | SIPA
]
.date[
### Spring 2026
]

---
class: center, middle

# What is R and how will we use it?

---

# What is R?

- "an alternative to traditional statistical packages such as SPSS, SAS, and Stata such that it is an extensible, open-source language and computing environment for Windows, Macintosh, UNIX, and Linux platforms." [(ICPSR)](https://www.icpsr.umich.edu/icpsrweb/content/shared/ICPSR/faqs/what-is-r.html)

- "an integrated suite of software facilities for data manipulation, calculation and graphical display." ([R-project.org](https://www.r-project.org/about.html))

---

# How will we use R?

We'll be using the RStudio interface to work with:

- [R scripts](https://cran.r-project.org/doc/contrib/Lemon-kickstart/kr_scrpt.html#:~:text=An%20R%20script%20is%20simply,the%20command%20line%20of%20R.%20() are basically text files that RStudio recognizes as R code.

- [R Markdown](https://rmarkdown.rstudio.com/lesson-1.html) files are used in RStudio to "both save and execute code and generate high quality reports that can be shared with an audience."
  - These lecture slides were created using R Markdown.
  - Everything you *submit* for this class will be a document generated with R Markdown.
  - But your workflow should always begin with an R script before writing up your work using R Markdown!

---

# What can you do with R+RStudio+RMarkdown?
- Data cleaning and manipulation
- Statistical and econometric analysis
- Data visualization
- **Reports and presentations**
- **Interactive content and dynamic visualizations**
- **Web scraping**
- Web applications / dashboards

---
class: center, middle

### Here's an example of a dynamic chart:

Source: [Federico Tiberti](https://fedetiberti.com/post/animated-plots-gganimate/)

---
class: center, middle

### Here's a screenshot of a dynamic website created with R using shiny:

<img src="data:image/png;base64,#polluterdash.png" width="80%" />

---

# What will we be doing in this class?

We'll be learning how to use R to explore data to inform policy. That means we'll be spending a lot of time working with R, but also thinking about how/when to use methods and concepts from Quant I and II:

- **Research design:** understanding how data structure and methods impact analysis and causal inference
- **Data wrangling:** cleaning and structuring messy data for analysis
- **Exploratory analysis:** identifying and analyzing key variables
- **Econometric analysis:** estimating (causal?) relationships between variables to inform policy
- **Data visualization & presentation:** conveying findings to your target audience
- **Policy writing & interpretation:** translating statistical analysis into accessible terms
- **Data for good:** thinking critically about using *data for good*

---
class: center, middle

# Examples from student project work

---
class: center, middle

---
class: center, middle

---
class: center, middle

<img src="data:image/png;base64,#nm-ca_chart1.png" width="102%" />

---
class: center, middle

<img src="data:image/png;base64,#table3LakuPandai.png" width="75%" />

---
class: center, middle

# Introductions

---

# Harold Stolper, instructor

- Graduated from SIPA many moons ago, returned to Columbia for my PhD in economics.
- This is my 11th year teaching quant courses at SIPA.
- Previously worked as the economist for a local non-profit doing research and advocacy to promote upward mobility for low-income NYers.
- Ongoing research into NYPD subway fare evasion enforcement
- My research interests center around access and equity issues related to policing, neighborhood change, public transit, access to education, and more
- Transitioned from Stata to R after years of using and teaching Stata.
- Some non-academic things I really like include the NBA+WNBA, nail art, and UK drill music.

---

# Teaching Assistant

- Frankie Michielli

---

# Student introductions

1. Preferred name (and pronouns, if comfortable sharing)
1. Previous R exposure/experience
1. Answer one of these two questions:
    - What's something you would eventually like to learn how to do in R?
    - What's something that you have observed or think is important that people in your field aren't paying attention to?
1. (optional) One piece of culture you are excited about right now
    - e.g. music, writing, TV or movies, fashion, a meme, other art, sports, etc.

---
class: center, middle

# Prerequisites, R setup, course goals

---
class: center, middle

# Questions for you

https://pollev.com/haroldpoll

---
class: center, middle

# Workflow

---

# Working directory

R looks for files in your **working directory**

- You can see all files located in your working directory in the "Files/Plots/Packages/Help/Viewer" pane

--

*For this class, you will always start by creating a new folder for each week and creating an* ***R Project*** *assigned to that folder*

---

# R projects

--

What is an R project?

- An R project is essentially a folder with a corresponding **.rproj** file

--

Why R projects?
- An R project keeps all of your R scripts, documents, and data together in one place with its own working directory
- This allows you to avoid explicitly specifying the working directory
- And ensures that multiple instances of RStudio open at the same time don't interfere with each other

---

# R projects

How do you set up an R project?

1. Create a new folder, e.g. Lecture1
2. Open RStudio and select `File -> New Project...` from the menu
3. Select `Existing Directory`, navigate to the folder you set up, and click `Create Project`

--

After you've created an R project, when you return to your work just double click on the .rproj file to open RStudio with the appropriate working directory.

Here is a link with more information on setting up and working with [R projects](https://intro2r.com/rsprojs.html)

--

You can double-check your working directory by executing the function `getwd()`

``` r
getwd()
```

```
## [1] "C:/Users/hbs2103/My Drive/Teaching/U6614-drive/Lectures-drive/Lecture1"
```

---

# Weekly workflow for this class

1. Create a project folder assigned to an .rproj file
1. Download an R script template from the class website (e.g. `lecture2-startclass.R`) and any corresponding data files
1. Write your own R code as you follow along during class
1. After class, you can compare your own R code to the code from class posted to the website (e.g. `lecture2-inclass.R`)
1. Most weeks this code will be the basis for a weekly assignment, due next Monday. You will extend this code and generate a pdf using R Markdown that includes the necessary code and write-up/interpretation. Attend recitation for further assistance with the assignment code and concepts.
1. Once you've written all of the code you need in your R script, you'll turn to the assignment R Markdown file (.rmd). You'll download the assignment questions/template (e.g.
`Assignment2-youruni.rmd`), copy *only* the code you need from your R script into *code chunks* in the rmd file, add write-up, *knit* your rmd file to a pdf, and submit via Courseworks.

---

# Knitting R Markdown (.rmd) files to PDFs

You can use R Markdown to knit different file formats: HTML, PDF, Word.

In order to knit to PDF, you need to install a **LaTeX distribution**.

We will provide installation instructions via Ed and get you up and running with R Markdown during recitation.

---
class: center, middle

# R Programming Basics

---

# Reference: shortcuts for executing code

- R scripts (.R files)
  - **Cmd/Ctrl + Enter**: execute highlighted line(s)
  - **Cmd/Ctrl + Shift + Enter** (without highlighting any lines): run entire script

- Code chunks in RMarkdown (.rmd files)
  - **Cmd/Ctrl + Enter**: execute highlighted line(s) within chunk
  - **Cmd/Ctrl + Shift + k**: "knit" entire document

---

# Assignment

__Assignment__ means assigning a value/set of values to an "object"

- `<-` is the assignment operator
  - in other languages `=` is the assignment operator
- good practice to put a space before and after assignment operator

``` r
# Create an object a and assign value
a <- 5
a
```

```
## [1] 5
```

``` r
# Create an object b and assign value
b <- "I'm so excited to be here in the lab again!"
b
```

```
## [1] "I'm so excited to be here in the lab again!"
```

Note 1: comments start with a `#`

Note 2: R is case sensitive!

---

# Objects and assignment

R stores information in objects (like all "object-oriented" programming languages).

Some objects:

- numbers
- character strings
- vectors
- matrices
- lists
- functions
- plots
- data frames (the datasets of R!)

---

# Functions

Functions do things to different objects.

Functions often accept arguments -- we "pass" arguments to functions.

--

Functions are also objects themselves that can be "called" to do things like:

- calculate and display statistics
- generate output
- display part or all of objects (e.g.
show some data)
- manipulate objects (e.g. create a new column of data)
- extract information from objects (e.g. the number of rows of data)

Base R includes lots of functions. We'll be working with additional packages that include some handy functions.

---

# Let's jump in!

Our goals for today's R workshop portion of class:

---
class: center, middle

# R workshop portion of class

---
class: center, middle

# Assignments & other course responsibilities

---
class: center, middle

# Assignment 1: submit a knit pdf via CW by 11:59pm next Monday

---

# General coding and assignment guidance

- Use blank spaces liberally: code is hard to read, and spaces help!
- Use comments liberally throughout your R script to describe your steps.
- Troubleshooting is a skill that will take time to develop! Here are some tips and resources:
  - Do NOT use AI tools for assignment troubleshooting! Now is the time to learn the R language on your own.
  - Consult the R script from class and recitation for sample code.
  - Get used to using R's built-in documentation by using `?`
  - When stuck, Google liberally for examples that work (but not ChatGPT). Focus on finding code samples that do something like what you want to do, and then adapt them to suit your needs. At this stage it's about getting things to work, not mastering all of the ins and outs of the syntax.
  - Always execute your code line by line as you go; this makes it easier to isolate the source of any errors and debug your code.

---

# Collaboration and academic integrity

Students must write up Assignments independently, but you are encouraged to discuss with others along the way.
If we come across answers to any parts of any assignments that are clearly not your own words, **all involved parties will receive a 0 and may be referred to Student Affairs if appropriate.**

---

# Generative AI

*AI-generated code is* ***NOT PERMITTED*** *for weekly assignments.*

*AI tools are* ***PERMITTED*** *for your project work in certain instances as long as this use is properly documented.*

*Using AI tools to interpret regression results and assess internal or external validity is* ***NOT PERMITTED*** *nor is it constructive.*

<br>

Students who violate these generative AI policies will be referred to Student Affairs and are subject to the Dean's Discipline Policy and Procedures.

---

# Use Ed Discussion for help!

- If you need help troubleshooting errors, posting to [Ed Discussion](https://edstem.org/us/courses/89614/discussion) is usually a great place to start (in addition to office hours).
- Your post should provide a [reproducible example](https://stackoverflow.com/help/minimal-reproducible-example), including code and a screenshot of the output/error message if applicable.
- Don't be bashful about posting for help troubleshooting code or setup issues... if you're running into trouble, odds are somebody else is too!
- A teaching team member will reply soon. You are also encouraged to reply to each other's posts if you have insight about how to resolve the issue (also an easy way to boost your participation!).

---

# Pre-class lessons and weekly quizzes

Each week, there will be between 1 and 3 html-based lessons for you to work through.

Starting next class, every class will begin with a short Courseworks quiz covering the pre-class lessons:

- Typically 10 multiple choice questions in total, 5-6 minutes to complete
- Will include 1 question on the Data Primer covering data used in class
- Quizzes are closed-note (and Courseworks Quizzes shows a log of your quiz activity!)
- I try not to emphasize detailed memorization for the quizzes but inevitably there is some
- As long as you are thoughtfully engaging with the pre-class material in advance then you're on the right track
- I strongly recommend executing the code as you work through each pre-class lesson

---

# Attendance and participation

**.my-style-ul[You must be able to attend Thursday recitation]**

- Early in the semester recitation will be used to review code from class and prep for assignments
- Later in the semester this time will be used for extra office hours and project meetings

---

# Attendance and participation

**Lecture attendance**

- Lecture attendance is required; multiple unexcused absences will drive your participation grade down towards zero.

**Participation**

- .my-style-ul[Participation (in class and through Ed) is worth 10% of your total grade].
- One of the goals of this course is to be able to talk about data and statistical/econometric methods and coding -- as you would in a job interview or on the job!
- We want to create our own data community with engagement from everybody in ways they are comfortable with. Let me know if you have suggestions!

---

# Illness and Personal Challenges

If you are feeling unwell or dealing with personal challenges, please email me .my-style-ul[in advance of class and/or any looming deadlines]. We will do our best to help you catch up on any missed material and make a plan to meet any upcoming deadlines.

Being a grad student in today’s world can be a demanding and stressful experience. If you're having a hard time keeping up, let's chat about possible accommodations and teaching team support.

---
class: center, middle

# Questions about the syllabus and course expectations?